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Abstract. This paper addresses the problem of finding cycles in the state tran- 
sition graphs of synchronous Boolean networks. Synchronous Boolean networks 
are a class of deterministic finite state machines which are used for the modeling 
of gene regulatory networks. Their state transition graph cycles, called attractors, 
represent cell types of the organism being modeled. When the effect of a disease 
or a mutation on an organism is studied, attractors have to be re-computed ev- 
ery time a fault is injected in the model. We present an algorithm for finding 
attractors which uses a SAT-based bounded model checking. Novel features of 
the algorithm compared to the traditional SAT-based bounded model checking 
approaches are: (1) a termination condition which does not require an explicit 
computation of the diameter and (2) a technique to reduce the number of addi- 
tional clauses which are needed to make paths loop-free. The presented algorithm 
uses much less space than existing BDD-based approaches and has a potential to 
handle several orders of magnitude larger networks. 
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1 Introduction 

A gene regulatory network (GRN) is a collection of DNA segments in a cell, called 
genes, which interact with each other [1]. Each gene contains information that deter- 
mine what the gene does and when the gene is active, or expressed. When a gene is 
active a process called transcription takes place, producing an ribonucleic acid (RNA) 
copy of the gene's information. This piece of RNA can then direct the synthesis of pro- 
teins. RNA or protein molecules resulting from the transcription process are known as 
gene products. 

Mathematical models of GRNs have been developed to capture the behavior of or- 
ganisms being modeled. Common GRN models include ordinary and partial differential 
equations, Boolean networks and their generalizations, Petri nets, Bayesian networks, 
stochastic equations, and process calculi [2]. There is always tension between general- 
ity and level of details, and thus tractability, of a model. The most appropriate mathe- 
matical framework can be selected depending on the scale involved, the nature of the 
available information, and the problem studied. In this paper, we consider the Boolean 
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network model, which has been shown useful for exploring GRNs in the context of cel- 
lular differentiation, cell cycle regulation, immune response, and evolution (see [3] for 
an overview). 

The Boolean network is a discrete-space discrete-time model in which every gene 
is viewed as a vertex whose input values represent gene products and output values 
represent the level of gene expression [4]. Th edges between vertices represent the in- 
teractions between genes. These interactions can be activatory, with an increase in the 
concentration of gene products in one gene leading to an increase in the level of gene ex- 
pression in other gene, or inhibitory, with an increase in one leading to a decrease in the 
other. The nature of influence of regulators on a gene is reflected by the Boolean func- 
tion assigned to the vertex. In this paper we consider the synchronous type of Boolean 
networks in which the values of functions of all vertices are updated simultaneously at 
each time step. 

Synchronous Boolean networks can be considered as a class of deterministic finite 
state machines. Any sequence of consecutive states of a network eventually converges to 
either a single state, or a cycle of states, called attractor. Attractors represent the pattern 
of gene expressions in the corresponding cell types of the organism being modeled [5]. 
When the effect of a disease or a mutation on an organism is studied, attractors have to 
be re-computed every time a fault is injected in the model [3]. 

All algorithms for computing attractors in Boolean networks face a state-space ex- 
plosion problem that must be addressed to handle large-scale models. A common ap- 
proach to combat it is to use symbolic algorithms which avoid building the state tran- 
sition graph describing the dynamic behavior of a GRN. Instead, the state transition 
graph is represented implicitly by means of Binary Decision Diagrams (BDDs) [6]. Al- 
gorithms based on BDDs [7-9] are usually able to process GRN models with up to a 
hundred of state variables. However, for larger networks, BDDs become too memory- 
consuming. Simulation-based approaches [10-12] can be applied to large networks, 
however, they are incomplete. 

Propositional decision procedures (SAT) do not suffer from the potential space ex- 
plosion of BDDs and can handle propositional satisability problems with thousands of 
variables [13]. This work is the first step in applying SAT procedures to computing at- 
tractors. The presented approach is based on SAT-based bounded model checking [14]. 
We use a SAT-solver for identification of paths of a particular length k in the state 
transition graph of a Boolean network. First we generate a propositional formula rep- 
resenting an unfolding of the transition relation of the network for k steps. A satisfying 
assignment to this propositional formula corresponds to a valid path in the state transi- 
tion graph. The process is repeated iteratively for larger and larger values of k until all 
attractors are identified. 

Note that systems which we are dealing with are deterministic and therefore the 
problem we are addressing is simpler than the traditional bounded model checking. 
This allows us to use a simple termination condition which does not require an explicit 
computation of the diameter and also to reduce the number of additional clauses which 
are needed to make paths loop-free. 

This paper contributes to the ongoing work on finding attractors by providing a 
complete solution which uses much less space than BDD-based approaches and thus 




has a potential to handle several orders of magnitude larger networks. The existing 
simulation- and BDD-based algorithms for finding attractors are only applicable to sim- 
ple organisms such as the yeast, a flower, or a fruit fly. The presented approach opens a 
possibility for exploring much more complex organisms including humans. 

The paper is organized as follows. Section 2 gives a background on synchronous 
Boolean networks. In Section 3 we describe the intuitive idea behind the presented 
approach. Section 4 presents the algorithm. Section 5 summarizes experimental results. 
Section 6 concludes the paper and discusses open problems. 



2 The Boolean Network Model 



In the Boolean network model of GRNs [4], every gene is represented by a vertex ; in 
a directed graph with an associated state variable x\ that takes the value 1 if the gene is 
expressed and otherwise. An edge from one vertex to another indicates that the former 
gene regulates the latter. Each vertex i also has an associated Boolean function /, which 
reflects the nature of influence of its regulators. 

Time is viewed as proceeding in discrete steps. For the synchronous type of update, 
the expressions of all genes are changed simultaneously. At each time step, the next 
value of the variable x, is determined by current values of r regulators of i as: 

X t = fi i X h > X h > • • • ! X 'rj ) ' (X) 

where jcj, ,x; 2 , . . . ,x, r are state variables associated to the regulators of Xi. 

The state of the network is defined as an ordered «-tuple of values of state variables 
Xi,X2,---,x n at a particular moment of time. Since a synchronous Boolean network is 
deterministic and finite, any sequence of its consecutive states eventually converges to 
either a single state, or a cycle of states, called attractor. 

An example of a Boolean network with 3 vertices is shown on the left-hand side 
of Figure 1 . Arrows indicate activatory regulation and blunt-ends represent inhibitory 
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regulation The following Boolean functions are associated to vertices: 

fl = A (x\ Vx%) 

/ 3 = -* 3 V(*i Ai 2 ) 

where "A", "V", and "-<" stand for the Boolean AND, OR and NOT operations, respec- 
tively. The State Transition Graph (STG) of this network is shown in the right-hand side 
of Figure 1. It has two attractors of length two. 

The Boolean network model has been extended to a multiple-valued one, in which 
each variable can take m values rather than only two [15, 16]. Since each m-valued 
variable can be encoded by |~log 2 m] Boolean variables, any multiple-valued network 
can be translated to a Boolean one and treated using the presented approach with no 
conceptual difference. 

3 Intuitive Idea 

The presented algorithm searches for a path of a given length k in the STG of a Boolean 
network. If a path is found, we check whether it contains a loop or not. Since each state 
of the STG of a Boolean network has a unique next state, once a path reaches a loop, it 
never leaves it. Therefore, we can determine the presence of a loop simply by checking 
whether the last state of the path occurs at least twice. 

Clearly, all states between any two occurrences of the last state belong to a loop. A 
loop corresponds to an attractor. We mark all attractor's states. In the following itera- 
tions, we will only search for paths in which the last state is not marked. 

Until at least one attractor remains unmarked, we can find a path of any length 
since we can cycle in an attractor forever. However, once all attractors are identified 
and marked, we will only be able find paths which are shorter than a given length (at 
most the diameter of the STG). So, when we search for a path of some length k and it 
does not exist, this means that all attractors are already identified and the algorithm can 
terminate. 

If a path of length k does exist and it is loop-free, we double k and continue the 
search for a path of the new length. 

We illustrate the algorithm on the example of the Boolean network in Figure 1 . The 
algorithm starts from searching for a path of length k = 3. Suppose that the path we 
found is 111 ^ Oil — ► 000. Since the last state (000) occurs only once, this path is 
loop-free. We increase A: to 6 and continue the search for a path of length 6. Suppose 
that the path we found is 110 — ^ 101 — > 010^ 101 — > 010 -> 101. Now, we can see that 
(101,010) is a two-state attractor and mark it. The following search for a path of length 
6 may return us the path 01 1 -> 000 -> 001 -> 000 -> 001 -> 000. Again, we mark 
(000,001) is a two-state attractor. The next search shows that there exist no more paths 
of length 6. We conclude that all attractors are identified and terminate the algorithm. 

As another possibility, while searching for a path of length k = 3 we may find a path 
001 — ► 000 — ► 001 . In this case, we mark (000, 001 ) is a two-state attractor and continue 

1 The use of two types of edges is common for describing Boolean networks in spite of the fact 
that it is redundant since the type of regulation is fully denned by the associated functions. 
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the search for a path of length 3. Next we may find the path 010 ^ 101 — > 010, and 
mark it. The following search will show that there exist no more paths of length 3 and 
algorithm will terminate. 

As we can see, the presented algorithm may terminate either before or after the 
depth of unfolding becomes equal to the diameter of the STG. 

4 Description of the Algorithm 

The pseudocode of the presented algorithm is given as Algorithm 1 . We use a SAT- 
solver for identification of paths of a particular length k in the STG of a Boolean net- 
work. First we generate a propositional formula F representing an unfolding of the 
transition relation T of the network for k steps. A satisfying assignment to this proposi- 
tional formula corresponds to a valid path in the STG. The process is repeated iteratively 
for larger and larger values of k until all attractors are identified. 

4.1 Initial unfolding 

Given a Boolean network with n vertices and the transition relation T, the algorithm first 
unfolds the transition relation k times, where k = min(n, 100). We empirically found it 
more time-efficient to unfold the transition relation directly by n steps for small net- 
works of size n < 100. For large networks with n > 1000, unfolding by n steps might 
take too much memory and it is usually unnecessary for identification of all attractors. 
This is justified by some specific features of gene regulatory networks which we de- 
scribe in Section 5. 

4.2 Direction of unfolding 

In the pseudocode, we use Tp,„ r to denote the transition relation T which is unfolded 
from the time step p to the time step r, i.e. 

r-l 

Tp„. r = A T ( s U s i+l)- 

i=P 

where Sj denotes the state of a Boolean network at the time step i. 

One specific feature of our algorithm is that we always unfold T from some time 
step —p to the time step so that the previous time frames rather then the next ones are 
added to the unfolding. The depth of the unfolding is increased by decreasing —p. In 
this way, the last state of the unfolded transition relation is always so, independently of 
the depth of the unfolding. Later we will explain how this helps us to reduce the number 
of additional clauses which are needed to make paths loop-free. 

4.3 Identification of paths 

Once the transition relation is unfolded, a SAT-solver is called to find a satisfying as- 
signment for the resulting propositional formula F. The function Sat in the pseudocode 
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Algorithm 1 An algorithm for computing attractors in a Boolean network with n ver- 

tices and the transition relation T. 

;' = n 

attractor_is_found = False 

A (so) = /* A (so) is the set of states of all attractors expressed in terms of variables of so */ 
F = 71; o /* F is the propositional formula representing the unfolding 71/...0 */ 
while Sat(F) do 
for {j = -\;j> do 
if Si = so then 
for (k = 0-k> j;k--) do 

A (so) = A(so) U (so «-» cj) /* Ck £ {0, 1}" is an assignment of the variables */ 
/* of Si returned by the SAT-solver; so «-> q. is defined by (2) */ 

end for 

F = FA-A(s ) 
break 
end if 
end for 

if attractor_is_found then 

attractor_is_found = False 
else 

F = F A r_2*,\„_j 
i = 2*i 
end if 
end while 



corresponds to a call to a SAT-solver. Sat takes an expression and returns True if there 
exists an assignment of variables which make the whole expression true. 

If a satisfying assignment does not exist, this means that there is no path of length ; 
in the STG. This implies that all attractors have been already identified and marked in 
the STG, so the algorithm terminates. 

4.4 Checking paths for loops 

If a SAT-solver finds a satisfying assignment, the algorithm checks whether there is a 
loop in the path corresponding to this assignment. As we mentioned before, we can 
determine the presence of a loop by checking whether the last state of a path occurs at 
least twice. Since in our case the last state of any unfolded transition relation is so, to 
identify an attractor, it is sufficient to check weather so occurs at least twice on a path. 

4.5 Adding restrictions to F 

If Sj — so for some j € {—2, . . . , —2,-1}, then we can conclude that we found an attrac- 
tor of length j. In this case, each of the attractor's states is added to the a characteristic 
function A (so ) which represents the set of states of all attractors expressed in terms of 
variables of so. 
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Benchmark 
name 


vertices 
n 


Attractors 
number x length 


BDD-based [7] 


SAT-based 


sec 


MB 


sec 


MB 


unfolding depth 


Arabidopsis thaliana 


15 


10 x 1 


0.077 


19.14 


0.035 


1.76 


15 


Budding yeast 


12 


7x 1 


0.109 


19.82 


0.046 


1.91 


24 


Drosophila melanogaster 


52 


7x 1 




> 1000 


0.093 


2.32 


52 


Fission yeast 


10 


13 x 1 


0.062 


19.04 


0.030 


1.78 


10 


Mammalian cell 


10 


1x1,1x7 


0.060 


19.04 


0.028 


1.76 


10 


T-cell receptor 


40 


8x 1,1x6 


0.093 


19.34 


0.030 


1.98 


40 


T-helper cell 


23 


3 x 1 


0.107 


19.61 


0.042 


1.81 


23 



Table 1. Experimental results for the Boolean networks models of real organisms; "-" stands for 
memory blow up. 



The n-bit vector c,- €{0,1 }" is used to denote an assignment of variables of s^ which 
is returned by the SAT-solver. The notation so *-> c& means that 

■so <-> c k = /\ (so [«'] <-> c k [i] ) , (2) 

i=o 

where so[i] is ;th variable of the state so and Ck[i] is ;th bit-position of the vector c,. 

By adding ^A(so) to the propositional formula F we constrain F in such a way that 
any satisfying assignment for F will contain no states of already identified attractors. 
Note that, in our case, it is sufficient to ensure that the state so does not belong to any 
already identified attractor in order to guaranty that no state in this path belongs to an 
identified attractor. Now it becomes evident why we have chosen to make the last state 
of any unfolded transition relation sq. 

5 Experimental Results 

We have implemented an experimental tool based on the presented algorithm. In this 
section, we compare it to the BDD-based tool for finding attractors 2 from [7]. Our 
implementation uses MiniSAT SAT-solver [17]. All experiments were run on a PC with 
Pentium III 750 MHz processor and 256 Mb memory. 

As benchmarks 3 we use existing Boolean networks models of real organisms shown 
in Table 1: control of flower morphogenesis in Arabidopsis thaliana [18], budding yeast 
cell cycle regulation [19], Drosophila melanogaster segment polarity genes expression 
patterns prediction [20], fission yeast cell cycle regulation [21], the mammalian cell 
cycle regulation [22], T-cell receptor signaling pathway analysis [23], and T-helper cell 
differentiation [24]. 

As we can see from Table 1, the presented SAT-based algorithm uses an order of 
magnitude less space than the BDD-based algorithm. For the Drosophila melanogaster, 

2 Both tools are available at http://web.it.kth.se/~dubrova. 

3 At present there is no common set of benchmarks for the GRN simulation tools. We have 
constructed the input descriptions of models shown in Table 1 manually from the data in the 
corresponding papers. 
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the BDD-based algorithm runs out of memory. The memory blow up occurs while try- 
ing to construct the initial transition relation T. 

We can also see that for only one benchmark, budding yeast, the depth of unfolding 
had to be doubled to 2n. For the rest of benchmarks, all attractors were identified after 
the first unfolding by n steps. 

The performance of the presented algorithm is determined by the number and length 
of attractors in a network, as well as to the length of the longest path to an attractor. For 
large networks, we may expect the number of attractors to be considerably smaller 
than the number of vertices n of the network. This is because the number of vertices n 
in a Boolean network corresponds to the number of relevant genes in the organism it 
models, and the number of attractors N a corresponds to the number of cell types of this 
organisms. Different hypothesizes have been made suggesting that, for large Boolean 
networks, N a = 0(^/n) [5] or N a = 0(n 2 / 3 ) [11]. 

As we can see from Table 1, the length of attractors is usually one. This is because 
the states of attractors represent the expression levels of genes in a given cell type, 
which are normally stable [20]. So, we may expect the length of attractors to be a small 
constant for large networks as well. 

Finally, the longest path to an attractor is related to the time which takes a cell 
to settle down into a stable pattern in the process of cell differentiation [5]. For all 
benchmarks in Table 1 this parameter was smaller than «. No empirical results are 
known for larger networks. 

6 Conclusion 

This work is the first step in applying SAT procedures to finding attractors in Boolean 
networks. We believe that the presented approach has a potential to handle several or- 
ders of magnitude larger networks than the ones which can be handled by the BDD- 
based approaches. Unfortunately, existing Boolean models of real organisms are small, 
so they do not allow us to support this claim. Our next step is to work in collaboration 
with biologists on applying the presented tool to create larger models of more complex 
organisms. 
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